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We introduce and analyze a general model of a popu- 
lation evolving over a network of selectively neutral geno- 
types. We show that the population's limit distribution 
on the neutral network is solely determined by the net- 
work topology and given by the principal eigenvector of the 
network's adjacency matrix. Moreover, the average num- 
ber of neutral mutant neighbors per individual is given by 
the matrix spectral radius. This quantifies the extent to 
which populations evolve mutational robustness: the in- 
sensitivity of the phenotype to mutations. Since the aver- 
age neutrality is independent of evolutionary parameters — 
such as, mutation rate, population size, and selective 
advantage — one can infer global statistics of neutral net- 
work topology using simple population data available from 
in vitro or in vivo evolution. Populations evolving on neu- 
tral networks of RNA secondary structures show excellent 
agreement with our theoretical predictions. 



I. INTRODUCTION 

Kimura's contention that a majority of genotypic 
change in evolution is selectively neutral |Q] has 
gained renewed attention with the recent analysis 
of evolutionary optimization methods Pj3l[| and the 
discovery of neutral networks in genotypc-phenotype 
models for RNA secondary structure Jl(],[l4|,^9| and 
protein structure fl]||. It was found that collections 
of mutually neutral genotypes, which are connected 
via single mutational steps, form extended networks 
that permeate large regions of genotype space. In- 
tuitively, a large degeneracy in genotype-phenotype 
maps, when combined with the high connectivity of 
(high-dimensional) genotype spaces, readily leads to 
such extended neutral networks. This intuition is now 
supported by recent theoretical results P, |l3[ , p6| , p8| . 

In in vitro evolution of ribozymes, mutations re- 
sponsible for an increase in fitness are only a small 
minority of the total number of accepted mutations 
pq|. This indicates that, even in adaptive evolu- 



tion, the majority of point mutations is neutral. The 
fact that only a minority of loci is conserved in se- 
quences evolved from a single ancestor similarly in- 
dicates a high degeneracy in ribozymal genotype- 
phenotype maps |2q] . Neutrality is also implicated 
in experiments where RNA sequences evolve a given 
structure starting from a range of different initial 
genotypes 0. More generally, neutrality in RNA and 
protein genotype-phenotype maps is indicated by the 
observation that their structures are much better con- 
served during evolution than their sequences 15|l8t|. 



Given the presence of neutral networks that pre- 
serve structure or function in sequence space, one asks, 
How does an evolving population distribute itself over 
a neutral network? Can we detect and analyze struc- 
tural properties of neutral networks from data on bi- 
ological or in vitro populations? To what extent does 
a population evolve toward highly connected parts of 
the network, resulting in sequences that are relatively 
insensitive to mutations? Such mutational robustness 
has been observed in biological RNA structures p2| , p3[ 
and in simulations of the evolution of RNA secondary 
structure plfl . However, an analytical understanding 
of the phenomenon, the underlying mechanisms, and 
their dependence on evolutionary parameters — such 
as, mutation rate, population size, selection advan- 
tage, and the topology of the neutral network — has 
up to now not been available. 

Here we develop a dynamical model for the evo- 
lution of populations on neutral networks and show 
analytically that, for biologically relevant population 
sizes and mutation rates, a population's distribution 
over a neutral network is determined solely by the 
network's topology. Consequently, one can infer im- 
portant structural information about neutral networks 
from data on evolving populations, even without spe- 
cific knowledge of the evolutionary parameters. Sim- 
ulations of the evolution of a population of RNA se- 
quences, evolving on a neutral network defined with 
respect to secondary structure, confirm our theoretical 
predictions and illustrate their application to inferring 
network topology. 
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II. MODELING NEUTRALITY 

We assume that genotype space contains a neutral 
network of high, but equal fitness genotypes on which 
the majority of a population is concentrated and that 
the neighboring parts of genotype space consist of 
genotypes with markedly lower fitness. The genotype 
space consists of all sequences of length L over a finite 
alphabet A of A symbols. The neutral network on 
which the population moves can be most naturally re- 
garded as a graph G embedded in this genotype space. 
The vertex set of G consists of all genotypes that are 
on the neutral network; denote its size by \G\. Two 
vertices are connected by an edge if and only if they 
differ by a single point mutation. 

We will investigate the dynamics of a population 
evolving on this neutral network and analyze the de- 
pendence of several population statistics on the topol- 
ogy of the graph G. With these results, we will then 
show how measuring various population statistics en- 
ables one to infer G's structural properties. 

For the evolutionary process, we assume a discrete- 
generation selection-mutation dynamics with constant 
population size M . Individuals on the neutral network 
G have a fitness a. Individuals outside the neutral net- 
work have fitnesses that are considerably smaller than 
a. Under the approximations we use, the exact fitness 
values for genotypes off G turn out to be immaterial. 
Each generation, M individuals are selected with re- 
placement and with probability proportional to fitness 
and then mutated with probability /i. These individ- 
uals form the next generation. 

This dynamical system is a discrete-time version 
of Eigen's molecular evolution model |g). Our anal- 
ysis can be directly translated to the continuous-time 
equations for the Eigen model. The results remain 
essentially unchanged. 

Although our analysis can be extended to more 
complicated mutation schemes, we will assume that 
only single point mutations can occur at each repro- 
duction of an individual. With probability /i one of 
the L symbols is chosen with uniform probability and 
is mutated to one of the A — 1 other symbols. Thus, 
under a mutation, a genotype s moves with uniform 
probability to one of the L(A — 1) neighboring points 
in genotype space. 



A. Infinite-Population Solution 

The first step is to solve for the asymptotic distri- 
bution of the population over the neutral network G 
in the limit of very large population sizes. 

Once the (infinite) population has come to equilib- 
rium, there will be a constant proportion P of the 
population located on the network G and a constant 
average fitness (/} in the population. Under selection 



the proportion of individuals on the neutral network 
increases from P to aP/{f). Under mutation a pro- 
portion (v) of these individuals remains on the net- 
work, while a proportion 1 — {v) falls off the neutral 
network to lower fitness. At the same time, a propor- 
tion Q of individuals located outside G mutate onto 
the network so that an equal proportion P ends up 
on G in the next generation. Thus, at equilibrium, we 
have a balance equation: 



P= m W)P + Q. 



(1) 



In general, the contribution of Q to P is negligible. 
As mentioned above, we assume that the fitness a of 
the network genotypes is substantially larger than the 
fitnesses of those off the neutral network and that the 
mutation rate is small enough so that the bulk of the 
population is located on the neutral network. More- 
over, since their fitnesses are smaller than the average 
fitness (/), only a fraction of the individuals off the 
network G produces offspring for the next generation. 
Of this fraction, only a small fraction mutates onto the 
neutral network G. Therefore, we neglect the term Q 
in Eq. and obtain: 



1. 



(2) 



This expresses the balance between selection expand- 
ing the population on the network and deleterious mu- 
tations reducing it by moving individuals off. 

Under mutation an individual located at genotype 
s of G with vertex degree d s (the number of neutral 
mutant neighbors) has probability 
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to remain on the neutral network G. If asymp- 
totically a fraction P s of the population is located 
at genotype s, then (v) is simply the average of 
v s over the asymptotic distribution on the network: 
(v) = ^2 S £qV s Ps/P- As Eq. (||) shows, the aver- 
age (v) is simply related to the population neutrality 
(d) = J^secdsPs/P- Moreover, using Eq. (||) we can 
directly relate the population neutrality (c?) to the av- 
erage fitness (/): 



(d)=L(A-l) 



H<7 



(4) 



Despite our not specifying the details of G's topol- 
ogy, nor giving the fitness values of the genotypes lying 
off the neutral network, one can relate the population 
neutrality (d) of the individuals on the neutral net- 
work directly to the average fitness (/) in the popu- 
lation. It may seem surprising that this is possible at 
all. Since the population consists partly of sequences 
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off the neutral network, one expects that the aver- 
age fitness is determined in part by the fitnesses of 
these sequences. However, under the assumption that 
back mutations from low-fitness genotypes off the neu- 
tral network onto G are negligible, the fitnesses of se- 
quences outside G only influence the total proportion 
P of individuals on the network, but not the average 
fitness in the population. 

Equation (||) shows that the population neutral- 
ity (d) can be inferred from the average fitness and 
other parameters — such as, mutation rate. However, 
as we will now show, the population neutrality (d) can 
also be obtained independently, from knowledge of the 
topology of G alone. 

The asymptotic equilibrium proportions {P s } of the 
population at network nodes s are the solutions of the 
simultaneous equations: 
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where [s]g is the set of neighbors of s that are also 
on the network G. Using Eq. (||), Eq. (||) can be 
rewritten as: 



(d)P s = [G-P) s , (6) 
where G is the adjacency matrix of the graph G: 

G = { 1 1 6 ^ G ' (7) 
st 1 otherwise. ^ ' 

Since G is nonnegative and the neutral network G is 
connected, the adjacency matrix is irreducible. There- 
fore, the theorems of Frobenius-Perron for nonneg- 
ative irreducible matrices apply |T^| . These imply 
that the proportions P s of the limit distribution on 
the network are given by the principal eigenvector of 
the graph adjacency matrix G. Moreover, the pop- 
ulation neutrality is equal to G's spectral radius p: 
(d) = p. In this way, one concludes that asymptoti- 
cally the population neutrality (d) is independent of 
evolutionary parameters (fi, L, a) and of the fitness 
values of the genotypes off the neutral network. It 
is a function only of the neutral network topology as 
determined by the adjacency matrix G. 

This fortunate circumstance allows us to consider 
several practical consequences. Note that knowledge 
of /u, (j, and (/) allows one to infer a dominant fea- 
ture of G's topology, namely, the spectral radius p 
of its adjacency matrix. In in vitro evolution experi- 
ments in which biomolecules are evolved (say) to bind 
a particular ligand [^0[, by measuring the proportion 
iy) of functional molecules that remain functional af- 
ter mutation, we can now infer the spectral radius p 
of their neutral network. In other situations, such as 
in the bacterial evolution experiments of Ref. (8|, it 
might be more natural to measure the average fitness 



(/} of an evolving population and then use Eq. (|J) to 
infer the population neutrality (d) of viable genotypes 
in sequence space. 



B. Blind and Myopic Random Neutral Walks 

In the foregoing we solved for the asymptotic av- 
erage neutrality (d) of an infinite population under 
selection and mutation dynamics and showed that it 
was uniquely determined by the topology of the neu- 
tral network G. To put this result in perspective, we 
now compare the population neutrality (d) with the ef- 
fective neutralities observed under two different kinds 
of random walk over G. The results illustrate infor- 
mative extremes of how network topology determines 
the population dynamics on neutral networks. 

The first kind of random walk that we consider is 
generally referred to as a blind ant random walk. An 
ant starts out on a randomly chosen node of G. Each 
time step it chooses one of its L(A — 1) neighbors at 
random. If the chosen neighbor is on G, the ant steps 
to this node, otherwise it remains at the current node 
for another time step. It is easy to show that this 
random walk asymptotically spends equal amounts of 
time at all of G's nodes Therefore, the network 
neutrality d of the nodes visited under this type of 
random walk is simply given by: 
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(8) 



It is instructive to compare this with the effec- 
tive neutrality observed under another random walk, 
called the myopic ant. An ant again starts at a ran- 
dom node s £ G. Each time step, the ant determines 
the set [s]g of network neighbors of s and then steps to 
one at random. Under this random walk, the asymp- 
totic proportion P s of time spent at node s is propor- 
tional to the node degree d s jf?} ■ It turns out that the 
myopic neutrality d seen by this ant can be expressed 
in terms of the mean d and variance Var(d) of node 
degrees over G: 



Var(d) 



(9) 



The network and myopic neutralities, d and d, are 
thus directly given in terms of local statistics of the 
distribution of vertex degrees, while the population 
neutrality (d) is given by p, the spectral radius of G's 
adjacency matrix. The latter is an essentially global 
property of G. 



III. MUTATIONAL ROBUSTNESS 

With these cases in mind, we now consider how dif- 
ferent network topologies are reflected by these neu- 
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tralities. In prototype models of populations evolv- 
ing on neutral networks, the networks are often as- 
sumed to be or are approximated as regular graphs 



0,||j28H|,||. If the graph G is, in fact, regular, 
each node has the same degree d and, obviously, one 
has (d) = d = d = d. 

In more realistic neutral networks, one expects G's 
neutralities to vary over the network. When this oc- 
curs, the population neutrality is typically larger than 
the network neutrality: (d) = p > d. This difference 
quantifies precisely the extent to which a population 
seeks out the most connected areas of the neutral net- 
work. Thus, a population will evolve a mutational 
robustness that is larger than if the population were 
to spread uniformly over the neutral network. Addi- 
tionally, the mutational robustness tends to increase 
during the transient phase in which the population 
relaxes towards the its asymptotic distribution. 

Assume, for instance, that initially the population 
is located entirely off the neutral network G at lower 
fitness sequences. At some time, a genotype s € G is 
discovered by the population. To a rough approxima- 
tion, one can assume that the probability of a geno- 
type s being discovered first is proportional to the 
number of neighbors, L(A — 1) — d s , that s has off the 
neutral network. Therefore, the population neutral- 
ity (do) when the population first enters the neutral 
network G is approximately given by: 



(d )=d- 



Var(d) 



L(A —l) — d 



(10) 



Therefore, we define the excess robustness r to be the 
relative increase in neutrality between the initial neu- 
trality and (asymptotic) population neutrality: 



(d) ~ (do) 
(do) ' 



(11) 



For networks that are sparse, i.e. d <S L(A — 1), this 
is well approximated by r i=a ((d) — d)/d. Note that, 
while r is defined in terms population statistics, the 
preceding results have shown that this robustness is 
only a function of G's topology and should thus be 
considered a property of the network. 



IV. FINITE-POPULATION EFFECTS 

Our analysis of the population distribution on the 
neutral network G assumed an infinite population. 
For finite populations, it is well known that sampling 
fluctuations converge a population and this raises a 
question: To what extent does the asymptotic dis- 
tribution P s still describe the distribution over the 
network for small populations? As a finite popula- 
tion diffuses over a neutral network JTsj ] , one might 
hope that the time average of the distribution over 



G is still given by P s . Indeed, the simulation results 
shown below indicate that for moderately large pop- 
ulation sizes, this seems to be the case. However, a 
simple argument shows that this cannot be true for 
arbitrarily small populations. 

Assume that the population size M is so small that 
the product of mutation rate and population size is 
much smaller than 1; i.e. Mp, < 1. In this limit the 
population will, at any point in time, be completely 
converged onto a single genotype s on the neutral net- 
work G. With probability Mp, a single mutant will be 
produced at each generation. This mutant is equally 
likely to be one of the L(A — 1) neighbors of s. If this 
mutant is not on G, it will quickly disappear due to 
selection. However, if the mutant is on the neutral 
network, there is a probability X/M that it will take 
over the population. When this happens, the popula- 
tion will effectively have taken a random- walk step on 
the network, of exactly the kind followed by the blind 
ant. Therefore, for Mp <C 1, the population neutral- 
ity will be equal to the network neutrality: (d) = d. In 
this regime, r ~ and excess mutational robustness 
will not emerge through evolution. 

The extent to which the initial population neutral- 
ity approaches (d) is determined by the extent to 
which evolution on G is dominated by sampling fluctu- 
ations. In neutral evolution, population conv ergen ce is 
generally only a function of the product Mp |5|, |23| , |3(| . 
Thus, as the product Mp ranges from values much 
smaller than 1 to values much larger than 1, we pre- 
dict that the population neutrality (d) shifts from the 
network neutrality d to the infinite-population neu- 
trality, given by G's spectral radius p. 



V. RNA EVOLUTION ON STRUCTURALLY 
NEUTRAL NETWORKS 

The evolution of RNA molecules in a simulated flow 
reactor provides an excellent arena in which to test the 
theoretical predictions of evolved mutational robust- 
ness. The replication rates (fitnesses) were chosen to 
be a function only of the secondary structures of the 
RNA molecules. The secondary structure of RNA is 
an essential aspect of its phenotype, as documented by 
its conservation in evolution |l5[ and the convergent 
in vitro evolution toward a similar secondary struc- 
ture when selecting for a specific function 0. RNA 
secondary structure prediction based on free energy 
minimization is a standard tool in experimental biol- 
ogy and has been shown to be reliable, especially when 
the minimum free energy structure is thermodynami- 
cally well defined [^0| . RNA secondary structures were 
determined with the Vienna Package Jig ], which uses 
the free energies from ]34j| . Free energies of dangling 
ends were set to 0. 

The neutral network G on which the population 
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evolves consists of all RNA molecules of length L = 18 
that fold into a particular target structure. A target 
structure (Fig. ||) was selected that contains sufficient 
variation in its neutrality to test the theory, yet is not 
so large as to preclude an exhaustive analysis of its 
neutral network topology. 




FIG. 1. The target RNA secondary structure. 

Using only single point mutations per replication, 
purine-pyrimidine base pairs {G-C, G-U, A-U} can 
mutate into each other, but not into pyrimidine-purine 
{C-G, U-G, U-A} base pairs. The target structure 
contains 6 base pairs which can each be taken from 
one or the other of these two sets. Thus, the approxi- 
mately 2 x 10 8 sequences that are consistent with the 
target's base pairs separate into 2 6 = 64 disjoint sets. 
Of these, we analyzed the set in which all base pairs 
were of the purine-pyrimidine type and found that it 
contained two neutral networks of 51,028 and 5, 169 
sequences that fold into the target structure. Simula- 
tions were performed on the largest of the two. The 
exhaustive enumeration of this network showed that 
it had a network neutrality of d — 12.0 with standard 
deviation yJVar(d) 3.4, a maximum neutrality of 
d s = 24, and a minimum of d s = 1. The spectral 
radius of the network's 51028 x 51028 adjacency ma- 
trix was p ~ 15.7. The theory predicts that, when 
Mfi 3> 1, the population neutrality should converge 
to this value. 

The simulated flow reactor contained a population 
of replicating and mutating RNA sequences [pQ. The 
replication rate of a molecule depends on whether its 
calculated minimum free energy structure equals that 
of the target: Sequences that fold into the target struc- 
ture replicate on average once per time unit, while 
all other sequences replicate once per 10 time units 
on average. During replication the progeny of a se- 
quence has probability fj, of a single point mutation. 
Selection pressure in the flow reactor is induced by 
an adaptive dilution flow that keeps the total RNA 
population fluctuating around a constant capacity M. 

Evolution was seeded from various starting se- 
quences with either a relatively high or a relatively low 
neutrality. Independent of the starting point, the pop- 
ulation neutrality converges to the predicted value, as 
shown in Fig. 0. 

Subsequently, we tested the finite-population effects 
on the population's average neutrality at several dif- 
ferent mutation rates. Figure || shows the dependence 
of the asymptotic average population neutrality on 



population size M and mutation rate jj,. As expected, 
the population neutrality depends only on the product 
Mp, of population size and mutation rate. For small 
Mfj, the population neutrality increases with increas- 
ing M/x, until M[i k, 500 where it saturates at the 
predicted value of (d) s=s 15.7. Since small populations 
do not form a stationary distribution over the neutral 
net, but diffuse over it fl9|| , the average population 
neutrality at each generation may fluctuate consid- 
erably for small populations. Theoretically, sampling 
fluctuations in the proportions of individuals at differ- 
ent nodes of the network scale inversely proportional 
to the square root of the population size. We there- 
fore expect the fluctuations in population neutrality to 
scale as the inverse square root of the population size 
as well. This was indeed observed in our simulations. 



<d> 




200 



800 



1000 



400 600 
Generations 

FIG. 2. The evolution of RNA mutational robustness: 
convergence of the population's average neutrality to the 
theoretical value, (d) = p ~ 15.7, predicted by G's spec- 
tral radius (upper dashed line). The network's average 
neutrality d is the lower dashed line. Simulations used a 
population size of M = 10 4 and mutation rates of fj, = 0.5 
and fi = 0.1 per sequence. They were started at sequences 
with either a relatively large number of neutral neighbors 
(d 3 — 24) (upper curves for each mutation rate) or a small 
number (d s — 5) (lower curves). 

Finally, the fact that r ~ 0.31 for this neutral net- 
work shows that under selection and mutation, a pop- 
ulation will evolve a mutational robustness that is 31 
percent higher than if it were to spread randomly over 
the network. 



VI. CONCLUSIONS 

We have shown that, under neutral evolution, a 
population does not move over a neutral network in 
an entirely random fashion, but tends to concentrate 
at highly connected parts of the network, resulting in 
phenotypes that are relatively robust against muta- 
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tions. Moreover, the average number of point muta- 
tions that leave the phenotype unaltered is given by 
the spectral radius of the neutral network's adjacency 
matrix. Thus, our theory provides an analytical foun- 
dation for the intuitive notion that evolution selects 
genotypes that are mutationally robust. 
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FIG. 3. Dependence of the average neutrality in 
the population on mutation rate fi and population 
size M. Simulations used three mutation rates, 
fi 6 {0.5,0.1,0.01}, and a range of population sizes, 
M G {10000, 5000, 1000, 500, 250, 100, 50, 20}. The results 
show that the evolved neutrality in the population de- 
pends on the product Mfi of population size and mutation 
rate. Neutrality increases with increasing Mfi and satu- 
rates when Mfi > 500. When Mfj, < 1 population neutral- 
ity approximates G's average neutrality d sa 12.0. When 
Mfi > 500 population neutrality converges to the spectral 
radius of the network's adjacency matrix, p » 15.7. 

Perhaps surprisingly, the tendency to evolve toward 
highly connected parts of the network is independent 
of evolutionary parameters — such as, mutation rate, 
selection advantage, and population size (as long as 
Mfi ^> 1) — and is solely determined by the network's 
topology. One consequence is that one can infer prop- 
erties of the neutral network's topology from simple 
population statistics. 

Simulations with neutral networks of RNA sec- 
ondary structures confirm the theoretical results and 
show that even for moderate population sizes, the pop- 
ulation neutrality converges to the infinite-population 
prediction. Typical sizes of in vitro populations are 
such that the data obtained from experiments are ex- 
pected to accord with the infinite-population results 
derived here. It seems possible then to devise in vitro 
experiments that, using the results outlined above, 
would allow one to obtain information about the topo- 
logical structure of neutral networks of biomolecules 
with similar functionality. 

We will present the extension of our theory to cases 
with multiple-mutation events per reproduction else- 



where. We will also report on analytical results for a 
variety of network topologies that we have studied. 

Finally, here we focused only on the asymptotic dis- 
tribution of the population on the neutral network. 
But how did the population attain this equilibrium? 
The transient relaxation dynamics, such as that shown 
in Fig. |^, can be analyzed in terms of the nonprincipal 
eigenvectors and eigenvalues of the adjacency matrix 
G. Since the topology of a graph is almost entirely de- 
termined by the eigensystem of its adjacency matrix, 
one should in principle be able to infer the complete 
structure of the neutral network from accurate mea- 
surements of the transient population dynamics. 
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